This document summarizes transcript leader (TL, also called 5′UTR) lengths in different annotations of Saccharomyces species: one annotation from the Saccharomyces Genome Database (yeastgenome.org; YG), and the others from Spealman et al. 2018 (SM).
## # A tibble: 27,412 x 22
## Gene aATG.context Length d1.context d1.posTSS d1.posATG d1.frame
## <chr> <chr> <int> <chr> <int> <int> <int>
## 1 YAL0… gccacaagaaa… 1 GTTCGTGCT… 153 152 2
## 2 YAL0… attacacctag… 1 gaATGGAGC… 11 10 1
## 3 YAL0… TATACACACAT… 51 AGAATTCTC… 200 149 2
## 4 YAL0… ttcaccaccca… 1 ATACCGTCT… 47 46 1
## 5 YAL0… ATAACAGATAA… 63 CTCACTTTG… 124 61 1
## 6 YAL0… TAAAGGAAAAC… 76 CATCTTCCA… 161 85 1
## 7 YAL0… AATAGGTGTAA… 98 TTGGCTTTT… 116 18 0
## 8 YAL0… ataaaggaggt… 1 AGAGCATAG… 23 22 1
## 9 YAL0… AGACCGATCTT… 42 ATGCTACCC… 54 12 0
## 10 YAL0… agacaagtaaG… 3 GTGGTCGTG… 106 103 1
## # … with 27,402 more rows, and 15 more variables: d2.context <chr>,
## # d2.posTSS <int>, d2.posATG <int>, d2.frame <int>, u1.context <chr>,
## # u1.posTSS <int>, u1.posATG <int>, u1.frame <int>, u2.context <chr>,
## # u2.posTSS <int>, u2.posATG <int>, u2.frame <int>, Organism <fct>,
## # uATGCt <int>, uATGCtmin20 <int>
## # A tibble: 5 x 7
## Organism uAUGtot Lengthtot Nlong Ntmeas Ntot NuAUG
## <fct> <int> <dbl> <int> <int> <int> <int>
## 1 cer_YG 936 234564 2835 6571 6571 504
## 2 cer_SM 2904 369593 2943 6446 6460 721
## 3 kud 1052 214669 2293 4780 4780 364
## 4 par 1047 214672 2322 4855 4855 397
## 5 uva 985 202024 2222 4636 4746 370
## # A tibble: 5 x 6
## Organism `25%` `5%` `50%` `75%` `95%`
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 cer_YG 39 19 60 97 227.
## 2 cer_SM 28 9 53 130 551.
## 3 kud 27 10 50 103 325.
## 4 par 28 10 49.5 100 330.
## 5 uva 28 11 49 97 314
The proportion of transcript leaders (TLs) containing upstream AUGs (uAUGs) is computed only over TLs with non-zero length; zero-length TLs are treated as missing data.
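The effect of treating zero-length TLs as missing can be illustrated with the counts from the summary table above. This is a minimal sketch, not the code that produced the report. It assumes that `NuAUG` counts TLs with at least one uAUG, that `Ntmeas` counts TLs with a measured (non-zero) length, and that `Ntot` counts all annotated genes; these column meanings are inferred from the names and are not stated in the source. The function names are hypothetical.

```python
# Counts copied from the summary table above.
# ASSUMPTION: NuAUG = TLs with >= 1 uAUG, Ntmeas = TLs with non-zero length,
# Ntot = all annotated genes. The source does not define these columns.
SUMMARY = {
    # organism: (NuAUG, Ntmeas, Ntot)
    "cer_YG": (504, 6571, 6571),
    "cer_SM": (721, 6446, 6460),
    "kud": (364, 4780, 4780),
    "par": (397, 4855, 4855),
    "uva": (370, 4636, 4746),
}


def uaug_proportions(n_uaug, n_measured, n_total):
    """Return (proportion over measured TLs, proportion over all genes)."""
    return n_uaug / n_measured, n_uaug / n_total


def proportion_table(summary):
    """Apply uaug_proportions to every organism in the summary."""
    return {org: uaug_proportions(*counts) for org, counts in summary.items()}


if __name__ == "__main__":
    for org, (p_meas, p_all) in proportion_table(SUMMARY).items():
        print(f"{org}: {p_meas:.4f} (measured TLs only) vs {p_all:.4f} (all genes)")
```

Under these assumptions the two denominators differ only for cer_SM and uva, the two annotations where `Ntmeas` is smaller than `Ntot`; excluding the unmeasured TLs gives a slightly higher proportion there.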
We compared transcript leader length (A), the proportion of transcript leaders containing uAUGs (B), and the density of uAUGs in transcript leaders (C) between annotations of Saccharomyces yeasts. Annotations are abbreviated as: cer_YG, S. cerevisiae S288C from the Saccharomyces Genome Database (Cherry et al. 2013); cer_SM, S. cerevisiae S288C from Spealman et al. (2018); kud, S. kudriavzevii FM1340; par, S. paradoxus CBS432; uva, S. bayanus var. uvarum JRY9191. The latter three are also from Spealman et al. (2018). All annotations show a similar median leader length of 49-60 nucleotides.
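As a rough cross-check of panels A and C against the printed tables, the sketch below computes a uAUG density and the range of median TL lengths. It assumes that density is the total uAUG count divided by the total TL length (`uAUGtot / Lengthtot`), scaled here to uAUGs per 1000 nucleotides; the source does not state how the plotted density was defined, so this definition and the function names are illustrative only.

```python
# Values copied from the summary and quantile tables above.
# ASSUMPTION: density = uAUGtot / Lengthtot; the source does not define it.
SUMMARY = {
    # organism: (uAUGtot, Lengthtot, median TL length from the `50%` column)
    "cer_YG": (936, 234564, 60),
    "cer_SM": (2904, 369593, 53),
    "kud": (1052, 214669, 50),
    "par": (1047, 214672, 49.5),
    "uva": (985, 202024, 49),
}


def uaug_density_per_kb(n_uaug, total_length):
    """uAUGs per 1000 nucleotides of transcript leader."""
    return 1000 * n_uaug / total_length


def median_range(summary):
    """Smallest and largest median TL length across annotations."""
    medians = [row[2] for row in summary.values()]
    return min(medians), max(medians)


if __name__ == "__main__":
    for org, (n_uaug, total_length, _median) in SUMMARY.items():
        print(f"{org}: {uaug_density_per_kb(n_uaug, total_length):.2f} uAUGs per kb")
    print("median TL length range:", median_range(SUMMARY))
```

With this definition, cer_SM has roughly twice the uAUG density of cer_YG, which fits its much longer upper tail of TL lengths in the quantile table (95% quantile of 551 versus 227 nucleotides).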